species distribution model
CISO: Species Distribution Modeling Conditioned on Incomplete Species Observations
Abdelwahed, Hager Radi, Teng, Mélisande, Zbinden, Robin, Pollock, Laura, Larochelle, Hugo, Tuia, Devis, Rolnick, David
Species distribution models (SDMs) are widely used to predict species' geographic distributions, serving as critical tools for ecological research and conservation planning. Typically, SDMs relate species occurrences to environmental variables representing abiotic factors, such as temperature, precipitation, and soil properties. However, species distributions are also strongly influenced by biotic interactions with other species, which are often overlooked. While some methods partially address this limitation by incorporating biotic interactions, they often assume symmetrical pairwise relationships between species and require consistent co-occurrence data. In practice, species observations are sparse, and the availability of information about the presence or absence of other species varies significantly across locations. To address these challenges, we propose CISO, a deep learning-based method for species distribution modeling Conditioned on Incomplete Species Observations. CISO enables predictions to be conditioned on a flexible number of species observations alongside environmental variables, accommodating the variability and incompleteness of available biotic data. We demonstrate our approach using three datasets representing different species groups: sPlotOpen for plants, SatBird for birds, and a new dataset, SatButterfly, for butterflies. Our results show that including partial biotic information improves predictive performance on spatially separate test sets. When conditioned on a subset of species within the same dataset, CISO outperforms alternative methods in predicting the distribution of the remaining species. Furthermore, we show that combining observations from multiple datasets can improve performance. CISO is a promising ecological tool, capable of incorporating incomplete biotic information and identifying potential interactions between species from disparate taxa.
- North America > United States > Montana (0.04)
- Europe > Switzerland > Vaud > Lausanne (0.04)
- North America > United States > Nevada (0.04)
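CISO conditions predictions on whichever species happen to be recorded at a site. One simple way to feed incomplete biotic information into a network is sketched below under our own assumptions; it is not necessarily CISO's architecture. The environmental predictors are concatenated with a per-species value and a per-species observed/unobserved mask. During training, a random subset of the known species is hidden and used as prediction targets. The names `encode_conditioning` and `split_for_training` are hypothetical.

```python
import random
from typing import Dict, List, Sequence, Tuple


def encode_conditioning(
    env: Sequence[float],
    observations: Dict[int, int],
    n_species: int,
) -> List[float]:
    """Concatenate environmental predictors with a partially observed species vector.

    `observations` maps a species index to 1 (present) or 0 (absent); species not in
    the dict are treated as unknown. Each species contributes a value feature and a
    mask flag, so the model can tell "observed absent" apart from "not observed".
    """
    values = [0.0] * n_species
    mask = [0.0] * n_species
    for idx, state in observations.items():
        if state not in (0, 1):
            raise ValueError("species state must be 0 (absent) or 1 (present)")
        values[idx] = float(state)
        mask[idx] = 1.0
    return list(env) + values + mask


def split_for_training(
    observations: Dict[int, int], keep_fraction: float, rng: random.Random
) -> Tuple[Dict[int, int], Dict[int, int]]:
    """Randomly hide part of the known observations: the kept part becomes the
    conditioning input, the hidden part becomes the prediction target."""
    species = sorted(observations)
    rng.shuffle(species)
    n_keep = round(keep_fraction * len(species))
    kept = {s: observations[s] for s in species[:n_keep]}
    hidden = {s: observations[s] for s in species[n_keep:]}
    return kept, hidden


# Hypothetical example: two environmental predictors, six species, four of them observed.
rng = random.Random(0)
obs = {0: 1, 2: 0, 3: 1, 5: 1}
kept, hidden = split_for_training(obs, keep_fraction=0.5, rng=rng)
x = encode_conditioning([12.3, 800.0], kept, n_species=6)
```

Because a new random split is drawn for every training sample, the model sees many different combinations of known and unknown species. At inference, it can therefore be conditioned on however many species were actually recorded.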
MaskSDM with Shapley values to improve flexibility, robustness, and explainability in species distribution modeling
Zbinden, Robin, van Tiel, Nina, Sumbul, Gencer, Vanalli, Chiara, Kellenberger, Benjamin, Tuia, Devis
Species Distribution Models (SDMs) play a vital role in biodiversity research, conservation planning, and ecological niche modeling by predicting species distributions based on environmental conditions. The selection of predictors is crucial, strongly impacting both model accuracy and how well the predictions reflect ecological patterns. To ensure meaningful insights, input variables must be carefully chosen to match the study objectives and the ecological requirements of the target species. However, existing SDMs, including both traditional and deep learning-based approaches, often lack key capabilities for variable selection: (i) flexibility to choose relevant predictors at inference without retraining; (ii) robustness to handle missing predictor values without compromising accuracy; and (iii) explainability to interpret and accurately quantify each predictor's contribution. To overcome these limitations, we introduce MaskSDM, a novel deep learning-based SDM that enables flexible predictor selection by employing a masked training strategy. This approach allows the model to make predictions with arbitrary subsets of input variables while remaining robust to missing data. It also provides a clearer understanding of how adding or removing a given predictor affects model performance and predictions. Additionally, MaskSDM leverages Shapley values for precise predictor contribution assessments, improving upon traditional approximations. We evaluate MaskSDM on the global sPlotOpen dataset, modeling the distributions of 12,738 plant species. Our results show that MaskSDM outperforms imputation-based methods and approximates models trained on specific subsets of variables. These findings underscore MaskSDM's potential to increase the applicability and adoption of SDMs, laying the groundwork for developing foundation models in SDMs that can be readily applied to diverse ecological applications.
- Europe > Switzerland > Vaud > Lausanne (0.04)
- North America > United States > Virginia (0.04)
- North America > United States > Maryland (0.04)
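Exact Shapley values require a value v(S) for every subset S of predictors. A model that accepts arbitrary predictor subsets, as MaskSDM's masked training is described to allow, can supply such values directly. The sketch below computes exact Shapley values from a value function. `toy_value` is a hypothetical stand-in for evaluating a masked model (its prediction or a performance score) with only the predictors in S unmasked. Exact enumeration costs 2^(n-1) evaluations per predictor, so it is practical only for a modest number of predictors or predictor groups.

```python
from itertools import combinations
from math import factorial
from typing import Callable, Dict, FrozenSet, Sequence


def shapley_values(
    predictors: Sequence[str],
    value: Callable[[FrozenSet[str]], float],
) -> Dict[str, float]:
    """Exact Shapley values: weighted average marginal contribution of each
    predictor over all subsets of the remaining predictors."""
    n = len(predictors)
    phi: Dict[str, float] = {}
    for p in predictors:
        others = [q for q in predictors if q != p]
        total = 0.0
        for k in range(len(others) + 1):
            # Shapley weight |S|! (n - |S| - 1)! / n! for subsets S of size k
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                total += weight * (value(s | {p}) - value(s))
        phi[p] = total
    return phi


def toy_value(subset: FrozenSet[str]) -> float:
    """Hypothetical score of a model restricted to `subset` (stand-in for a masked model)."""
    score = 0.5
    if "temperature" in subset:
        score += 0.2
    if "precipitation" in subset:
        score += 0.1
    if {"temperature", "soil_ph"} <= subset:  # interaction shared by two predictors
        score += 0.05
    return score


phi = shapley_values(["temperature", "precipitation", "soil_ph"], toy_value)
```

In this toy case, `phi` is approximately 0.225 for temperature, 0.1 for precipitation and 0.025 for soil_ph. The interaction bonus is split equally between the two predictors involved, and the values sum to v(all) - v(empty) = 0.35.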
Spatial Clustering of Citizen Science Data Improves Downstream Species Distribution Models
Ahmed, Nahian, Roth, Mark, Hallman, Tyler A., Robinson, W. Douglas, Hutchinson, Rebecca A.
Citizen science biodiversity data present great opportunities for ecology and conservation across vast spatial and temporal scales. However, because these data are collected opportunistically, they lack the sampling structure required by modeling methods that address a pervasive challenge in ecological data collection: imperfect detection, i.e., the likelihood of under-observing species on field surveys. Occupancy modeling is an example of an approach that accounts for imperfect detection by explicitly modeling the observation process separately from the biological process of habitat selection. This produces species distribution models that speak to the pattern of the species on a landscape after accounting for imperfect detection in the data, rather than the pattern of species observations corrupted by errors. To achieve this benefit, occupancy models require multiple surveys of a site across which the site's status (i.e., occupied or not) is assumed constant. Since citizen science data are not collected under the required repeated-visit protocol, observations may be grouped into sites post hoc. Existing approaches for constructing sites discard some observations and/or consider only geographic distance and not environmental similarity. In this study, we compare ten approaches for site construction in terms of their impact on downstream species distribution models for 31 bird species in Oregon, using observations recorded in the eBird database. We find that occupancy models built on sites constructed by spatial clustering algorithms perform better than existing alternatives.
- North America > United States > Ohio > Lucas County > Oregon (0.04)
- North America > United States > Oregon > Benton County > Corvallis (0.04)
- North America > United States > New York > Tompkins County > Ithaca (0.04)
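Post hoc site construction can be illustrated with a minimal single-linkage clustering that requires environmental similarity in addition to geographic proximity. This is an illustrative sketch, not one of the ten approaches evaluated in the study. The thresholds, the projected kilometre coordinates, the standardized covariates and the function name `cluster_sites` are all assumptions. Each resulting cluster is treated as one site, and its observations serve as the repeated visits an occupancy model needs.

```python
import math
from typing import Dict, List, Sequence, Tuple

# (x_km, y_km, standardized environmental covariates) for one observation/checklist
Observation = Tuple[float, float, Sequence[float]]


def cluster_sites(
    observations: Sequence[Observation],
    max_geo_km: float,
    max_env_dist: float,
) -> List[int]:
    """Group observations into occupancy 'sites' by single-linkage clustering.

    Two observations are linked when they are within `max_geo_km` of each other in
    projected space AND within `max_env_dist` in covariate space; sites are the
    connected components of this link graph (found with union-find).
    """
    parent = list(range(len(observations)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, (xi, yi, ei) in enumerate(observations):
        for j in range(i + 1, len(observations)):
            xj, yj, ej = observations[j]
            close_in_space = math.dist((xi, yi), (xj, yj)) <= max_geo_km
            similar_habitat = math.dist(ei, ej) <= max_env_dist
            if close_in_space and similar_habitat:
                parent[find(j)] = find(i)

    # Relabel component roots as consecutive site ids in order of first appearance.
    site_ids: Dict[int, int] = {}
    return [site_ids.setdefault(find(i), len(site_ids)) for i in range(len(observations))]


obs = [
    (0.0, 0.0, [0.10]),
    (0.3, 0.1, [0.12]),    # near obs 0 with similar habitat -> same site
    (0.2, 0.2, [0.90]),    # near obs 0 but different habitat -> separate site
    (10.0, 10.0, [0.10]),  # similar habitat but far away -> separate site
]
labels = cluster_sites(obs, max_geo_km=1.0, max_env_dist=0.2)
```

Here `labels` is `[0, 0, 1, 2]`. Single linkage can chain many observations into one large site. In practice, one would likely cap site size or extent so that the assumption of a constant occupancy status across visits stays plausible.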
Hybrid Spatial Representations for Species Distribution Modeling
We address an important problem in ecology called Species Distribution Modeling (SDM), whose goal is to predict whether a species exists at a certain position on Earth. In particular, we tackle a challenging version of this task, where we learn from presence-only data in a community-sourced dataset, model a large number of species simultaneously, and do not use any additional environmental information. Previous work has used neural implicit representations to construct models that achieve promising results. However, implicit representations often generate predictions of limited spatial precision. We attribute this limitation to their inherently global formulation and inability to effectively capture local feature variations. This issue is especially pronounced with presence-only data and a large number of species. To address this, we propose a hybrid embedding scheme that combines both implicit and explicit embeddings. Specifically, the explicit embedding is implemented with a multiresolution hashgrid, enabling our models to better capture local information. Experiments demonstrate that our results exceed other works by a large margin on various standard benchmarks, and that the hybrid representation is better than both purely implicit and explicit ones. Qualitative visualizations and comprehensive ablation studies reveal that our hybrid representation successfully addresses the two main challenges. Our code is open-sourced at https://github.com/Shiran-Yuan/HSR-SDM.
- Asia > Japan > Honshū > Chūbu > Ishikawa Prefecture > Kanazawa (0.04)
- North America > United States > California > Alameda County > Berkeley (0.04)
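The explicit half of the hybrid embedding is a multiresolution hash grid. The sketch below shows the general technique in 2-D, using the spatial hash popularized by Instant-NGP. It concatenates the grid features with sinusoidal features as a stand-in for the implicit component. The level count, table size, resolutions, the sinusoidal encoding, the assumption that longitude/latitude are rescaled to [0, 1], and all names are illustrative choices. The authors' repository remains the authoritative implementation.

```python
import math
import random
from typing import List

# Per-dimension primes of the spatial hash used by Instant-NGP.
PRIMES = (1, 2654435761)


class HashGrid2D:
    """Multiresolution hash-grid encoding of a location (u, v) in [0, 1]^2.

    Each level has its own resolution and its own table of feature vectors
    (trainable in a real model; randomly initialised here). Instant-NGP indexes
    coarse levels directly when they fit in the table; this sketch hashes every
    level for brevity.
    """

    def __init__(self, n_levels=4, table_size=2**12, n_features=2,
                 base_res=16, growth=2.0, seed=0):
        rng = random.Random(seed)
        self.table_size = table_size
        self.resolutions = [math.floor(base_res * growth**lvl) for lvl in range(n_levels)]
        self.tables = [
            [[rng.uniform(-1e-4, 1e-4) for _ in range(n_features)] for _ in range(table_size)]
            for _ in range(n_levels)
        ]

    def _hash(self, ix: int, iy: int) -> int:
        return ((ix * PRIMES[0]) ^ (iy * PRIMES[1])) % self.table_size

    def encode(self, u: float, v: float) -> List[float]:
        out: List[float] = []
        for res, table in zip(self.resolutions, self.tables):
            x, y = u * res, v * res
            x0, y0 = math.floor(x), math.floor(y)
            fx, fy = x - x0, y - y0
            feats = [0.0] * len(table[0])
            # Bilinear interpolation over the four surrounding grid corners.
            corners = ((0, 0, (1 - fx) * (1 - fy)), (1, 0, fx * (1 - fy)),
                       (0, 1, (1 - fx) * fy), (1, 1, fx * fy))
            for dx, dy, w in corners:
                entry = table[self._hash(x0 + dx, y0 + dy)]
                for k in range(len(feats)):
                    feats[k] += w * entry[k]
            out.extend(feats)
        return out


def sinusoidal(u: float, v: float, n_freqs: int = 4) -> List[float]:
    """Global Fourier features (assumed stand-in for the implicit embedding)."""
    feats: List[float] = []
    for f in range(n_freqs):
        for c in (u, v):
            feats += [math.sin(2**f * math.pi * c), math.cos(2**f * math.pi * c)]
    return feats


def hybrid_embedding(grid: HashGrid2D, u: float, v: float) -> List[float]:
    # Concatenate global (sinusoidal) and local (hash-grid) features for a downstream MLP.
    return sinusoidal(u, v) + grid.encode(u, v)


grid = HashGrid2D()
emb = hybrid_embedding(grid, 0.3, 0.7)
```

The sinusoidal part varies smoothly over the whole globe. The hash-grid features let nearby locations on fine levels take distinct values, which is the kind of local detail the abstract attributes to the explicit embedding.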
cito: An R package for training neural networks using torch
Amesoeder, Christian, Hartig, Florian, Pichler, Maximilian
Deep Neural Networks (DNNs) have become a central method in ecology. Most current deep learning (DL) applications rely on one of the major deep learning frameworks, in particular Torch or TensorFlow, to build and train DNNs. Using these frameworks, however, requires substantially more experience and time than typical regression functions in the R environment. Here, we present 'cito', a user-friendly R package for DL that allows specifying DNNs in the familiar formula syntax used by many R packages. To fit the models, 'cito' uses 'torch', taking advantage of the numerically optimized torch library, including the ability to switch between training models on the CPU or the graphics processing unit (GPU) (which makes it possible to train large DNNs efficiently). Moreover, 'cito' includes many user-friendly functions for model plotting and analysis, including optional confidence intervals (CIs) based on bootstraps for predictions and explainable AI (xAI) metrics for effect sizes and variable importance with CIs and p-values. To showcase a typical analysis pipeline using 'cito', including its built-in xAI features to explore the trained DNN, we build a species distribution model of the African elephant. We hope that by providing a user-friendly R framework to specify, deploy and interpret DNNs, 'cito' will make this interesting model class more accessible to ecological data analysis. A stable version of 'cito' can be installed from the Comprehensive R Archive Network (CRAN).
- Europe > Germany > Bavaria > Regensburg (0.05)
- Africa > Kenya (0.04)
- North America > United States > New York (0.04)
On the selection and effectiveness of pseudo-absences for species distribution modeling with deep learning
Zbinden, Robin, van Tiel, Nina, Kellenberger, Benjamin, Hughes, Lloyd, Tuia, Devis
Species distribution modeling is a highly versatile tool for understanding the intricate relationship between environmental conditions and species occurrences. However, the available data often lacks information on confirmed species absence and is limited to opportunistically sampled, presence-only observations. To overcome this limitation, a common approach is to employ pseudo-absences, which are specific geographic locations designated as negative samples. While pseudo-absences are well-established for single-species distribution models, their application in the context of multi-species neural networks remains underexplored. Notably, the significant class imbalance between species presences and pseudo-absences is often left unaddressed. Moreover, the existence of different types of pseudo-absences (e.g., random and target-group background points) adds complexity to the selection process. Determining the optimal combination of pseudo-absence types is difficult and depends on the characteristics of the data, particularly considering that certain types of pseudo-absences can be used to mitigate geographic biases. In this paper, we demonstrate that these challenges can be effectively tackled by integrating pseudo-absences in the training of multi-species neural networks through modifications to the loss function. This adjustment involves assigning different weights to the distinct terms of the loss function, thereby addressing both the class imbalance and the choice of pseudo-absence types. Additionally, we propose a strategy to set these loss weights using spatial block cross-validation with presence-only data. We evaluate our approach using a benchmark dataset containing independent presence-absence data from six different regions and report improved results when compared to competing approaches.
- North America > United States (0.14)
- North America > Canada (0.04)
- South America (0.04)
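Below is a minimal sketch of a loss with separately weighted terms for presences and the two pseudo-absence types. It assumes a per-term mean binary cross-entropy. The exact terms and weighting used in the paper may differ, and the function names and weight keys are hypothetical. In this formulation, the presence weight counteracts the imbalance between presences and pseudo-absences. The relative weights of the target-group and random terms control how much each background type contributes. Per the abstract, such weights can be tuned with spatial block cross-validation.

```python
import math
from typing import Dict, Sequence


def mean_bce(probs: Sequence[float], target: int, eps: float = 1e-7) -> float:
    """Mean binary cross-entropy of predicted probabilities against a fixed 0/1 target."""
    if not probs:
        return 0.0
    total = 0.0
    for p in probs:
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total -= math.log(p) if target == 1 else math.log(1.0 - p)
    return total / len(probs)


def weighted_pseudo_absence_loss(
    p_presence: Sequence[float],      # predictions for species at their presence records
    p_target_group: Sequence[float],  # predictions at target-group background points
    p_random: Sequence[float],        # predictions at random background points
    weights: Dict[str, float],
) -> float:
    """Each term is averaged separately, so the weights (not the raw counts of
    presences vs. pseudo-absences) set the balance between the terms."""
    return (
        weights["presence"] * mean_bce(p_presence, 1)
        + weights["target_group"] * mean_bce(p_target_group, 0)
        + weights["random"] * mean_bce(p_random, 0)
    )


loss = weighted_pseudo_absence_loss(
    [0.5, 0.5], [0.5], [0.5, 0.5, 0.5],
    weights={"presence": 1.0, "target_group": 1.0, "random": 1.0},
)
```

With all predictions at 0.5 and unit weights, each term contributes ln 2, so `loss` is 3 ln 2 (about 2.079). Setting a weight to zero removes that pseudo-absence type from training entirely.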
Species Distribution Models with GIS & Machine Learning in R
Machine Learning Models for Habitat Suitability - Implement and interpret common ML techniques to build habitat suitability maps for the birds of Peninsular Malaysia. This is a practical, hands-on course: we will spend some time on the theoretical concepts, but the majority of the course focuses on implementing different techniques on real data and interpreting the results. Each video introduces a new concept or technique that you can apply to your own projects.
- Asia > Malaysia (0.27)
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.05)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.05)
Species Distribution Models with GIS & Machine Learning in R
Are You an Ecologist or Conservationist Interested in Learning GIS and Machine Learning in R? Then this course is for you! I will take you on an adventure into the amazing field of Machine Learning and GIS for ecological modelling. You will learn how to implement species distribution modelling and map suitable habitats for species in R. My name is Minerva Singh, and I am an Oxford University MPhil (Geography and Environment) graduate. I completed a PhD at Cambridge University (Tropical Ecology and Conservation). I have several years of experience analyzing real-life spatial data from different sources and publishing in international peer-reviewed journals.
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.26)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.26)
- Asia > Malaysia (0.06)
- Education > Educational Technology > Educational Software > Computer Based Training (0.44)
- Education > Educational Setting > Online (0.44)
Bioclimatic Modelling: A Machine Learning Perspective
Many machine learning (ML) approaches are widely used to generate bioclimatic models that predict the geographic range of organisms as a function of climate. Applications such as predicting range shifts of organisms and the ranges of invasive species under climate change are important for understanding the impact of climate change. However, the success of ML-based approaches depends on a number of factors. While it can safely be said that no single ML technique is effective in all applications, and that the success of a technique depends predominantly on the application or the type of problem, it is useful to understand how these techniques behave in order to make an informed choice. This paper presents a comprehensive review of machine learning-based bioclimatic model generation and analyses the factors influencing the success of such models. Given the wide use of statistical techniques, our discussion also includes conventional statistical techniques used in bioclimatic modelling.
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
- North America > United States > New York (0.04)
- North America > United States > California > San Mateo County > Menlo Park (0.04)
- Research Report (1.00)
- Overview (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Evolutionary Systems (0.69)
- Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.49)
- Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (0.48)